Abstract

The  goal  of  Byzantine  Broadcast  (BB)  is  to  allow  a  set  of  fault-free  nodes  to  agree  on 
information  that  a  source  node  wants  to  broadcast  to  them,  in  the  presence  of  Byzantine  faulty 
nodes.  We  consider  design  of  efficient  algorithms  for  BB  in  point-to-point  networks  where  the 
rate  of  transmission  over  each  communication  link  is  limited  by  its  "link  capacity".  Given  an 
algorithm  A  to  solve  BB  in  a  network  G,  let  us  denote  by  t(G,  L,  A)  the  worst-case  execution 
time  of  A  without  violating  link  capacity  constraints  in  G,  when  L  is  the  size  of  the  input  at  the 
source node. Then, we define the capacity of BB in network G as the supremum of L/t(G, L, A)
over all L and all possible BB algorithms A.

We  prove  upper  bounds  on  the  capacity  of  Byzantine  broadcast  over  arbitrary  point-to-point 
networks.  An  algorithm  is  then  given  that  solves  BB  at  a  rate  of  at  least  1/2  or  1/3  of  the  capacity, 
depending  on  different  conditions  the  underlying  network  satisfies.  This  Byzantine  Broadcast 
algorithm tolerates up to f faulty nodes as long as the total number of nodes is at least 3f + 1
and the connectivity is at least 2f + 1.

To the best of our knowledge, ours is the first algorithm that achieves a constant fraction of
capacity of BB in general point-to-point networks.

1 Introduction

The  problem  of  Byzantine  Broadcast  -  also  known  as  the  Byzantine  Generals  problem  [14]  - 
was  introduced  by  Pease,  Shostak  and  Lamport  in  their  1980  paper  [23].  Since  the  first  paper  on 
this  topic,  Byzantine  Broadcast  has  been  the  subject  of  intense  research  activity,  due  to  its  many 
potential  practical  applications,  including  replicated  fault-tolerant  state  machines  [5],  and  fault- 
tolerant  distributed  file  storage  [25].  Informally,  Byzantine  Broadcast  (BB)  can  be  described  as 
follows  (we  will  define  the  problem  more  formally  later).  There  is  a  source  node  that  needs  to 
broadcast a message (also called its input) to all the other nodes such that even if some of the nodes
are  Byzantine  faulty,  all  the  fault-free  nodes  will  still  be  able  to  agree  on  an  identical  message;  the 
agreed  message  is  identical  to  the  source's  input  if  the  source  is  fault-free. 

We consider the problem of maximizing the throughput of Byzantine Broadcast (BB) in networks
of point-to-point links, wherein each directed communication link is subject to a "capacity"
constraint. Informally speaking, throughput of BB is the number of bits of Byzantine Broadcast that
can be achieved per unit time (on average), under the worst-case behavior by the faulty nodes.
Despite the large body of work on BB [9, 6, 3, 13, 2, 22], performance of BB in general
point-to-point networks has not been investigated previously. In reality, link capacities may differ significantly.
Existing  algorithms  often  do  not  perform  well  under  such  realistic  conditions.  In  fact,  one  can 
easily  construct  example  networks  in  which  previously  proposed  algorithms  achieve  throughput 
that  is  arbitrarily  worse  than  the  optimal.  In  our  prior  work,  we  have  developed  an  algorithm  that 
optimizes  the  throughput  in  4-node  point-to-point  networks  [18],  and  also  an  optimal  algorithm 
when  total  communication  overhead  is  the  cost  metric  [19].  In  contrast,  this  paper  presents  a  BB 
algorithm  for  arbitrary  point-to-point  networks.  The  paper  makes  two  main  contributions: 

1.  We  prove  upper  bounds  on  the  capacity  of  BB  in  point-to-point  networks  wherein  each 
directed  communication  link  is  subject  to  a  capacity  constraint. 

2.  We  present  the  first  BB  algorithm  that  achieves  a  constant  fraction  (1/2  or  1/3)  of  the  capacity 
in  arbitrary  point-to-point  networks. 

2  Preliminaries 

2.1  The  Byzantine  Broadcast  (BB)  Problem 

Byzantine  Broadcast  is  an  example  of  a  class  of  problems  known  as  Byzantine  Agreement.  The 
BB problem considers a network of n nodes, named 1, 2, ..., n, with one node designated as the
sender  or  source,  and  the  other  nodes  designated  as  the  peers.  In  our  discussion,  we  will  assume 
that  node  1  is  the  source  node.  Source  node  1  is  given  an  input  value  x  containing  L  bits,  and  the 
goal  here  is  for  the  source  to  broadcast  its  input  to  all  the  other  nodes.  The  fault-free  nodes  must 
"agree  on"  the  input  value  being  broadcast  by  the  source,  despite  the  possibility  that  some  of  the 
nodes  may  be  faulty  (possibly  including  the  source).  In  particular,  the  following  conditions  must 
be  satisfied  when  the  input  value  at  the  source  node  is  x: 

• Termination: Every fault-free node i must eventually decide on an output value; let us
denote the output value of fault-free node i as $x'_i$.

• Agreement: All fault-free nodes must agree on an identical output value, i.e., there exists $x'$
such that $x'_i = x'$ for each fault-free node i.

• Validity: If the source node is fault-free, then the agreed value must be identical to the input
value of the source, i.e., $x' = x$.
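
The three conditions above can be checked mechanically on the outcome of a single execution. The following is a minimal sketch, not part of the algorithm itself: the function name `check_bb_outcome`, the dictionary representation of the outputs, and the use of `None` for an undecided node are all assumptions made for illustration.

```python
def check_bb_outcome(source, source_input, outputs, faulty):
    """Evaluate Termination, Agreement and Validity for one finished execution.

    outputs maps node id -> decided value (None if the node never decided);
    faulty is the set of node ids controlled by the adversary.
    """
    decided = [outputs[i] for i in outputs if i not in faulty]
    # Termination: every fault-free node has decided on some value.
    termination = all(v is not None for v in decided)
    # Agreement: all fault-free nodes decided on one identical value x'.
    agreement = termination and len(set(decided)) <= 1
    # Validity: only constrains the outcome when the source is fault-free.
    validity = agreement and (
        source in faulty or not decided or decided[0] == source_input
    )
    return {"termination": termination, "agreement": agreement, "validity": validity}


# Hypothetical 4-node run in which node 3 is faulty and reports a wrong value.
print(check_bb_outcome(1, b"x", {1: b"x", 2: b"x", 3: b"y", 4: b"x"}, {3}))
```

Note that the output of a faulty node is ignored entirely, which is why the same outputs violate Agreement if node 3 is assumed to be fault-free.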

Adversary Model: The faulty nodes are controlled by an adversary that has complete knowledge
of the network topology, the algorithm, and the information the source is trying to send. No
secret is hidden from the adversary. The adversary can take over up to f nodes at any point during
execution of the algorithm, where f < n/3. These nodes are said to be faulty. The faulty nodes can
engage in any kind of deviation from the algorithm, including sending false messages, collusion,
and crash failures.

2.2  Network  Model 

We assume a synchronous point-to-point network modeled as a directed simple graph G(V, E),
where the set of vertices V = {1, 2, ..., n} represents the nodes in the point-to-point network, and
the set of edges E represents the links in the network. With a slight abuse of terminology, we
will use the terms edge and link interchangeably. We assume that n ≥ 3f + 1 and that the network
connectivity is at least 2f + 1 (these two conditions are necessary for existence of a correct BB
algorithm [9]).

In the given network, links may not exist between all node pairs. Each directed link is associated
with a fixed link capacity, which specifies the maximum amount of information that can be
transmitted on that link per unit time. Specifically, over a directed link (i, j) with capacity z bits/unit
time, we assume that up to zt bits can be reliably sent from node i to node j over time duration
t (for any non-negative t). This is a deterministic model of channel capacity that has been widely
used in the network coding literature [15, 4, 11, 12]. All link capacities are assumed to be integers.
Rational link capacities can be turned into integers by choosing a suitable time unit. Irrational
link capacities can be approximated by integers with arbitrary accuracy by choosing a suitable
time unit. Propagation delays on the links are assumed to be zero (relaxing this assumption does
not impact the correctness of results shown for large input sizes). We also assume that each node
correctly knows the identity of the nodes at the other end of its links.
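
As an illustration of the remark on rational capacities, the sketch below rescales the time unit by the least common multiple of the denominators so that every capacity becomes an integer. The function name `integer_capacities` and the example link capacities are hypothetical.

```python
from fractions import Fraction
from math import lcm


def integer_capacities(capacities):
    """Stretch the time unit so that all rational capacities become integers.

    capacities maps a directed link (i, j) to a Fraction in bits per old time unit.
    Returns (scale, scaled), where one new time unit equals `scale` old units.
    """
    scale = lcm(*(c.denominator for c in capacities.values()))
    scaled = {link: int(c * scale) for link, c in capacities.items()}
    return scale, scaled


# Hypothetical capacities of 3/2, 2/3 and 1 bits per unit time.
links = {(1, 2): Fraction(3, 2), (2, 3): Fraction(2, 3), (1, 3): Fraction(1)}
print(integer_capacities(links))
```

The ratios between link capacities are unchanged by this rescaling, so throughput results carry over after dividing by the scale factor.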

2.3  Multigraph  Representations  and  MinCuts 

In our discussion below, we will find it convenient to interpret a point-to-point network
G as a directed multigraph with unit capacity edges. Thus, an edge e = (i, j) of capacity z
will now be modeled using z directed unit-capacity edges from node i to node j. Additionally,
by ignoring the directions of edges in this multigraph, we obtain an undirected multigraph
representation of the network, denoted as $\overline{G}$. Figures 1(a) to 1(c) illustrate examples of different
representations of a graph. Define MINCUT(G, i, j) as the directed mincut in directed graph G
from node i to node j. Similarly, for the undirected representation, define MINCUT($\overline{G}$, i, j) as
the undirected mincut in undirected graph $\overline{G}$ between node i and node j. Note that, while we
always have MINCUT($\overline{G}$, i, j) = MINCUT($\overline{G}$, j, i), in general MINCUT(G, i, j) may not be equal to
MINCUT(G, j, i). We also define the following notations for later use:

• $D(G, i) = \min_{j \in V,\, j \neq i} \mathrm{MINCUT}(G, i, j)$: the minimum directed cut from node i to any of the
other nodes in G. It is well-known that D(G, i) is the broadcast capacity from node i in G [7];
broadcast capacity from node i characterizes the maximum rate at which node i can deliver
information to all the other nodes in the network (in the absence of any failures).

• $U(G) = \min_{i, j \in V,\, i \neq j} \mathrm{MINCUT}(\overline{G}, i, j)$: the minimum undirected cut between any pair of nodes
in the undirected multigraph representation $\overline{G}$ of graph G.

In the example of Figure 1, MINCUT(G, 1, 4) = 1, MINCUT($\overline{G}$, 1, 4) = 3, D(G, 1) = 1, and U(G) = 2.

[Figure 1: Graph representations of the network. (a) Directed simple graph G. (b) Directed
multigraph representation of G; every directed edge has capacity 1. (c) Undirected multigraph
$\overline{G}$; every undirected edge has capacity 1.]
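
The quantities D(G, i) and U(G) can be computed with any max-flow routine, since a mincut equals the corresponding maximum flow. The sketch below uses a plain Edmonds-Karp implementation. The 4-node capacity map is a hypothetical network, not the graph of Figure 1, and the function names are assumptions made for illustration.

```python
from collections import deque


def max_flow(cap, s, t):
    """Edmonds-Karp maximum flow; cap maps directed (u, v) -> integer capacity."""
    residual, nodes = {}, set()
    for (u, v), c in cap.items():
        residual[(u, v)] = residual.get((u, v), 0) + c
        residual.setdefault((v, u), 0)
        nodes.update((u, v))
    flow = 0
    while True:
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:          # BFS for a shortest augmenting path
            u = queue.popleft()
            for v in nodes:
                if v not in parent and residual.get((u, v), 0) > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        path, v = [], t
        while parent[v] is not None:              # walk back from t to s
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[e] for e in path)
        for (u, v) in path:
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
        flow += bottleneck


def undirected(cap):
    """Ignore directions: each directed unit edge becomes usable both ways."""
    und = {}
    for (u, v), c in cap.items():
        und[(u, v)] = und.get((u, v), 0) + c
        und[(v, u)] = und.get((v, u), 0) + c
    return und


def broadcast_capacity(cap, nodes, i):
    """D(G, i): minimum directed cut from node i to any other node."""
    return min(max_flow(cap, i, j) for j in nodes if j != i)


def min_undirected_cut(cap, nodes):
    """U(G): minimum undirected cut between any pair of nodes."""
    und = undirected(cap)
    return min(max_flow(und, i, j) for i in nodes for j in nodes if i < j)


# Hypothetical network (not the one in Figure 1).
cap = {(1, 2): 2, (2, 3): 1, (3, 4): 1, (4, 1): 2, (1, 3): 1}
nodes = [1, 2, 3, 4]
print(broadcast_capacity(cap, nodes, 1), min_undirected_cut(cap, nodes))
```

In this hypothetical network node 4 has a single incoming unit edge, so the directed cut from node 1 to node 4 is much smaller than the undirected cut between the same two nodes.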


2.4  Capacity  of  Byzantine  Broadcast 

We first define the throughput of a given algorithm in a given network. For any algorithm A
that solves BB with an L-bit input at the source in network G, t(G, L, A) is defined as the duration of
time required, in the worst case, from the initiation of the algorithm until its termination, without
violating the capacity constraints of the links in G. Throughput of algorithm A is then defined as
L / t(G, L, A). We define capacity of BB as follows.


Capacity $C_{BB}$ of Byzantine Broadcast in graph G is defined as the supremum of the throughput of
all algorithms that solve the BB problem. That is,


$$C_{BB}(G) = \sup_{A \text{ solves BB in } G} \frac{L}{t(G, L, A)}. \tag{1}$$


3  Algorithm  Overview 

The  goal  of  the  proposed  BB  algorithm  is  to  perform  Byzantine  Broadcast  of  an  L-bit  input. 
The  algorithm  is  designed  to  perform  efficiently  for  large  L.  We  first  briefly  describe  the  salient 
features  of  the  BB  algorithm. 

Execution  in  Multiple  Generations:  To  improve  throughput,  BB  of  an  L-bit  value  is  performed 
"in  parts".  In  particular,  for  a  certain  integer  B,  the  L-bit  value  is  divided  into  L/B  parts,  each 
consisting  of  B  bits.  For  convenience,  we  assume  that  L/B  is  an  integer.  A  sub-algorithm  (called 
Algorithm 1 below) is used to perform BB on each of these B-bit parts. We refer to each execution
of Algorithm 1 as a "generation". Algorithm 1 for each generation has the following structure:

Algorithm  1  Structure  of  the  BB  algorithm  for  generation  g 

1. Broadcast with failure detection: This phase of the algorithm allows the source node
1 to broadcast B bits of generation g. This phase also performs failure detection, while
satisfying the following conditions: (i) all fault-free nodes agree on whether a failure has
been detected or not; (ii) if a failure is not detected, then the broadcast values received
by the fault-free nodes in generation g satisfy the agreement and validity conditions stated
in Section 2.1; and (iii) if a failure is detected, then at least one faulty node has adversely
affected the execution of generation g.

2. Fault Diagnosis: Whenever a failure is detected above, additional steps are taken to learn
(possibly partial) information regarding the identity of the faulty node(s). This technique
is also known as "dispute control" [1], and has been used in our previous work [18, 19].
The Fault Diagnosis phase identifies a pair of adjacent nodes, of which at least one node is
guaranteed to be faulty. The directed links between these two nodes are not used in future
generations of Algorithm 1 (this is how the algorithm adapts to past failure detections). It
can be shown [18, 19] that, if a certain node is identified as potentially faulty in at least f + 1
generations, then that node must necessarily be faulty (and is therefore excluded from the
execution of future generations). Thus, the total number of generations during which faulty
behavior is detected is at most f(f + 1).


Long-Term Throughput: Due to the bounded number of generations in which the faulty nodes
can misbehave, it turns out that the throughput of the Byzantine Broadcast algorithm (which
tolerates up to f Byzantine faults) can be made arbitrarily close to the throughput at which we
can perform Broadcast with failure detection of Algorithm 1 above. For lack of space, we
omit the proof of this claim. The proof follows from our prior work on throughput of Byzantine
Broadcast [18].
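
The amortization argument can be illustrated numerically. The sketch below rests on a simplified cost model that is an assumption, not taken from the paper: each generation takes B/rate time units, and each of the at most f(f + 1) generations with a detected failure costs an extra `diag_time` that does not depend on L. The function name and all numeric values are hypothetical.

```python
def long_term_throughput(L, B, rate, f, diag_time):
    """Worst-case throughput of L-bit BB under the assumed cost model.

    rate      : throughput of broadcast with failure detection (bits per time unit)
    diag_time : assumed extra time spent per generation in which a failure is detected
    """
    generations = L // B                      # L/B is assumed to be an integer
    broadcast_time = generations * (B / rate)
    worst_case_diagnosis = f * (f + 1) * diag_time
    return L / (broadcast_time + worst_case_diagnosis)


# Hypothetical numbers: the diagnosis overhead is fixed, so it vanishes as L grows.
for L in (10**3, 10**6, 10**9):
    print(L, long_term_throughput(L, B=1000, rate=4, f=1, diag_time=10_000))
```

Because the diagnosis term is bounded independently of L, the ratio approaches `rate` from below as L grows, which is the behavior the paragraph above describes.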

Due  to  the  above  reason,  we  will  now  focus  our  attention  on  the  problem  of  performing 
Broadcast  with  Failure  Detection  and  propose  an  algorithm  for  this.  Our  algorithm  will  make  use 
of  a  new  solution  to  the  Multi-Party  Equality  (MEQ)  problem  as  a  sub-algorithm.  In  Section  4  we 
discuss  our  algorithm  for  MEQ,  and  later  describe  how  that  can  be  used  to  solve  the  broadcast  with 
failure  detection  problem. 

4  The  Multiparty  Equality  (MEQ)  Problem 

Let us consider the MEQ problem for the m nodes in a network $\mathcal{G}(\mathcal{V}, \mathcal{E})$. Thus, $\mathcal{V}$ is the set of
nodes in the network, and $\mathcal{E}$ is the set of directed edges in $\mathcal{G}$. As we will see later, our algorithm
for BB in network G will apply the MEQ algorithm to various sub-graphs of G.

For the MEQ problem, each of the nodes in $\mathcal{V}$ starts with an input, and the objective of MEQ
is for the nodes to determine whether they all share the same input. The MEQ problem does not
consider faulty behavior. Formally, requirements of our version of MEQ are as follows:

• Each node i is given an input $x_i$ consisting of B bits.

• Each node sets an output bit to be 0 or 1.

• If $x_1 = \cdots = x_m$, then all the nodes set output to 0; otherwise, at least one node outputs 1.

We can define throughput, and capacity $C_{MEQ}$, of MEQ on network $\mathcal{G}$ analogous to throughput and
capacity $C_{BB}$ of Byzantine Broadcast. For lack of space, we do not repeat these definitions here.

Theorem 1 In point-to-point network $\mathcal{G}(\mathcal{V}, \mathcal{E})$, the capacity of the MEQ problem is upper bounded by the
minimum undirected cut, that is, $C_{MEQ}(\mathcal{G}) \le U(\mathcal{G})$.

Theorem 1 is a natural extension of the well-known result on the communication complexity
of the two-party equality problem [26]. The proof is included in Appendix A. In the rest of this
section, we discuss our algorithm that solves MEQ with throughput at least $U(\mathcal{G})/2$ using random
linear coding. While the MEQ problem may potentially be solved using other algorithms, the
structure of our MEQ algorithm is designed to help solve the BB problem, as seen later.

We refer to the strategy used in the algorithms below as "local coding" to contrast it with
traditional network coding strategies [11, 12]. Specifically, in our algorithms, as will be seen later,
the coding scheme does not combine packets from different sources.

4.1  Solving  the  MEQ  Problem  along  Spanning  Trees 

We first present an algorithm that solves MEQ using a set of edge-disjoint spanning trees in
$\overline{\mathcal{G}}$. Given the undirected multigraph $\overline{\mathcal{G}}$, define R as the "spanning tree packing number" of $\overline{\mathcal{G}}$,
which is the maximum number of edge-disjoint undirected unit-capacity spanning trees that can
be "packed" in $\overline{\mathcal{G}}$ [21]. Figure 1(d) shows two spanning trees packed in the graph in Figure 1(c). It
is well-known [16] that

$$U(\mathcal{G})/2 \le R \le U(\mathcal{G}).$$

The unit-capacity edges included in the R disjoint spanning trees of $\overline{\mathcal{G}}$ will be used in Algorithm
2 below for solving MEQ in $\mathcal{G}$. The direction in which each such edge is used is the same as the
direction of the corresponding edge in the directed multigraph representation of $\mathcal{G}$. For instance,
in Figure 1(d), edge e1 is used in the direction from node 1 to node 4, because the corresponding
unit-capacity directed edge in Figure 1(b) is in that direction. Similarly, edge e2 in Figure 1(d) is
used to send packets from node 4 to node 1, because the corresponding edge in Figure 1(b) is in
that direction. In the algorithm below, we will denote the links (or edges) by the corresponding
node pair listed in the order in which the edge can be used. For instance, e1 = (1, 4) and e2 = (4, 1).

We assume that B/R is an integer, and represent the B-bit input $x_i$ at each node i as a row vector
of R symbols from Galois Field $GF(2^{B/R})$: $x_i = [x_i(1), \dots, x_i(R)]$.

Algorithm 2 MEQ along R Spanning Trees

1. For each unit-capacity edge e = (i, j) that is in one of the R trees in $\overline{\mathcal{G}}$, choose a row vector
$c_e = [c_e(1), c_e(2), \dots, c_e(R)]$ (called a coefficient vector) containing R symbols, each chosen
uniformly at random from $GF(2^{B/R})$. Node i transmits to node j symbol $y_e = c_e x_i^T$, which is a
random linear combination of the symbols in $x_i$. $y_e$ is said to be a "coded symbol".

2. At each node j, for each unit-capacity link e = (i, j) that belongs to one of the R trees, node j
checks whether $y_e$ equals $c_e x_j^T$. The check is said to fail if $y_e \neq c_e x_j^T$. (Recall that $y_e$ is $c_e x_i^T$.)

3. At each node j, if any of the received symbols fails the check in step 2, then node j sets its
output bit to 1; otherwise node j sets its output bit to 0.

Suppose that $\mathcal{G}$ is identical to graph G in Figure 1. In this case, assuming that the two undirected
spanning trees shown in Figure 1(d) are used, node 1 sends one random linear combination of its
input symbols to node 4 using the directed unit-capacity edge from node 1 to node 4, as specified by
the spanning tree shown in dotted lines. Similarly, node 4 sends one random linear combination
of its R input symbols to node 1, using the edge specified by the spanning tree shown in solid
lines. The coefficients in steps 1 and 2, although randomly chosen, can be viewed as a part of the
algorithm specification, and assumed to be known to all the nodes (i.e., the coefficients are chosen
at algorithm design time, not at runtime).
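
The three steps of Algorithm 2 can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: the symbols live in GF(2^8) (so B/R = 8) with the reduction polynomial x^8 + x^4 + x^3 + x + 1, the second spanning tree is made up for the example, and all function names are hypothetical.

```python
import random


def gf256_mul(a, b):
    """Multiply two elements of GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a              # addition in GF(2^k) is bitwise XOR
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B               # reduce back into 8 bits
    return result


def coded_symbol(c_e, x):
    """Linear combination of the R symbols of x with coefficient vector c_e."""
    y = 0
    for c, s in zip(c_e, x):
        y ^= gf256_mul(c, s)
    return y


def meq_along_trees(tree_edges, inputs, coeffs):
    """tree_edges: directed (i, j) pairs; inputs: node -> list of R symbols;
    coeffs: one length-R coefficient vector per edge. Returns node -> output bit."""
    output = {node: 0 for node in inputs}
    for (i, j), c_e in zip(tree_edges, coeffs):
        y_e = coded_symbol(c_e, inputs[i])            # step 1: node i sends y_e
        if y_e != coded_symbol(c_e, inputs[j]):       # step 2: node j checks y_e
            output[j] = 1                             # step 3: mismatch detected
    return output


# Hypothetical instance: m = 4 nodes, R = 2 trees, B = 16 bits (two 8-bit symbols).
tree_edges = [(1, 2), (1, 4), (4, 3),     # tree with the edge order used in Section 4.2
              (4, 1), (3, 2), (2, 4)]     # second tree, assumed for illustration
rng = random.Random(0)                    # fixed seed: coefficients chosen "at design time"
coeffs = [[rng.randrange(256) for _ in range(2)] for _ in tree_edges]
same = {node: [0x3A, 0xC5] for node in (1, 2, 3, 4)}
print(meq_along_trees(tree_edges, same, coeffs))   # equal inputs: every output bit is 0
```

When the inputs are equal, every check passes regardless of the coefficients. Whether unequal inputs are always detected depends on the coefficient choice, which is the subject of Section 4.2.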

4.2  Correctness  of  Algorithm  2 

Recall that m is the number of nodes in $\mathcal{G}$. Let us define $d_i(r) = x_i(r) - x_m(r)$ as the difference
between $x_i$ and $x_m$ at the r-th symbol. Step 2 of Algorithm 2 determines, for link e = (i, j), whether
$c_e x_i^T = c_e x_j^T$, which is equivalent to checking whether

$$c_e(1)\,(d_i(1) - d_j(1)) + \cdots + c_e(R)\,(d_i(R) - d_j(R)) = 0.$$

Such a check is performed for each of the (m - 1) edges in each of the R trees, resulting in (m - 1)R
checks. These (m - 1)R checks together can be represented in a matrix form as

$$M_R\, d^T = 0, \tag{2}$$

where $M_R$ is an (m - 1)R-by-(m - 1)R square matrix defined by the elements of the $c_e$ vectors for all links
e in the R spanning trees, and $d = [d_1(1), \dots, d_{m-1}(1), d_1(2), \dots, d_{m-1}(2), \dots, d_1(R), \dots, d_{m-1}(R)]$.

In Algorithm 2, all m output bits are set to 0 if and only if $M_R d^T = 0$. According to the definition
of the MEQ problem, all the output bits should be 0 if and only if $x_1 = \cdots = x_m$. Also, $x_1 = \cdots = x_m$
if and only if d = 0. Finally, if $M_R$ is invertible, then the only solution of $M_R d^T = 0$ is d = 0. It then
follows that Algorithm 2 is correct if $M_R$ is invertible.

Theorem 2 When the coefficients in step 1 of Algorithm 2 are chosen independently and uniformly at
random from $GF(2^{B/R})$, matrix $M_R$ is invertible with probability at least 1 - [...]. Thus, for large enough
B, Algorithm 2 is correct with a non-zero probability.


Proof Sketch: Let us label the R undirected spanning trees as $T_1, \dots, T_R$. We can order the rows
of $M_R$ such that rows (k - 1)(m - 1) + 1 to k(m - 1) correspond to the m - 1 edges on spanning tree
$T_k$. For each spanning tree $T_k$, assign a unique index from 1 to m - 1 to its m - 1 edges. Denote
by $c_{k,l}(r)$ the randomly chosen coefficient for the r-th input symbol used for generating the coded
symbol sent on the l-th edge of tree $T_k$.

Define $C_{k,r} = \mathrm{diag}(c_{k,1}(r), \dots, c_{k,m-1}(r))$ as the diagonal matrix with the diagonal elements filled
with the r-th coefficient used on the m - 1 edges on tree $T_k$. Also, define the (m - 1)-by-(m - 1) matrix
$A_k$ to represent the m - 1 edges on tree $T_k$ as follows: (1) if the l-th directed edge on $T_k$ is pointing
from node i to node j, then the l-th row of $A_k$ has the i-th element as -1 and the j-th element as 1,
the remaining entries in the l-th row being 0; (2) if i = m then the j-th element of the l-th row is set to
1, the remaining elements of that row being 0; (3) if j = m then the i-th element of the l-th row is set
to -1, the remaining elements of that row being 0. As an example, the spanning tree marked by
dotted lines in Figure 1(d) contains three edges (m = 4). Suppose that we index the edges from 1
to 3 in the following order: (1,2), (1,4), (4,3). The resulting A matrix for this tree is

$$A = \begin{bmatrix} -1 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

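Now $M_R$ can be written in the following form:

$$M_R = \begin{pmatrix}
C_{1,1}A_1 & C_{1,2}A_1 & \cdots & C_{1,R}A_1 \\
C_{2,1}A_2 & C_{2,2}A_2 & \cdots & C_{2,R}A_2 \\
\vdots & \vdots & \ddots & \vdots \\
C_{R,1}A_R & C_{R,2}A_R & \cdots & C_{R,R}A_R
\end{pmatrix}. \tag{3}$$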

Define

$$M_k = \begin{pmatrix}
C_{1,1}A_1 & \cdots & C_{1,k}A_1 \\
\vdots & \ddots & \vdots \\
C_{k,1}A_k & \cdots & C_{k,k}A_k
\end{pmatrix}$$

for 1 ≤ k ≤ R. Note that $M_{k_1}$ is a sub-matrix of $M_{k_2}$ when $k_1 < k_2$. We can show by induction
that, with probability at least [...], the determinant of $M_k$ is non-zero.
Then the theorem follows by setting k = R. The detailed proof is in Appendix B. □

Theorem 2 implies that, for sufficiently large B, there exists a set of coefficient vectors that make
$M_R$ invertible. Then Algorithm 2 deterministically solves the MEQ problem in $\mathcal{G}$ with this set of
coefficient vectors.
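
The matrix $A_k$ used in the proof sketch can be built mechanically from the ordered edge list of a tree. The sketch below follows the three rules stated in the proof sketch and reproduces the example matrix for the edge order (1,2), (1,4), (4,3). The function names are hypothetical. The determinant is computed over the rationals only to illustrate the standard fact that the reduced incidence matrix of a spanning tree has determinant +1 or -1, and is therefore non-zero in any field.

```python
from fractions import Fraction


def tree_matrix(edges, m):
    """Build A_k: row l encodes the l-th directed edge (i, j) of the tree.

    Entry i is -1 and entry j is +1; the column of node m is dropped, so an
    edge that touches node m contributes a single non-zero entry.
    """
    A = [[0] * (m - 1) for _ in edges]
    for l, (i, j) in enumerate(edges):
        if i != m:
            A[l][i - 1] = -1
        if j != m:
            A[l][j - 1] = 1
    return A


def determinant(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in M]
    n, det = len(M), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            det = -det                      # a row swap flips the sign
        det *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= factor * M[c][k]
    return det


A = tree_matrix([(1, 2), (1, 4), (4, 3)], m=4)
print(A, determinant(A))
```

Since each block $C_{k,r} A_k$ scales row l of $A_k$ by the coefficient $c_{k,l}(r)$, such a block is invertible whenever all m - 1 of those coefficients are non-zero.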

Observe that Algorithm 2 requires only one round of communication, the length of a round
being equal to the transmission time of B/R bits (one symbol from $GF(2^{B/R})$) along a unit-capacity
edge. Without loss of generality, assume that the unit of capacity is 1 bit/time unit. Then it takes
B/R time units for Algorithm 2 to terminate, and its throughput is B/(B/R) = R bits/unit time. Recall
that $R \ge U(\mathcal{G})/2$ and $C_{MEQ}(\mathcal{G}) \le U(\mathcal{G})$. Thus, the throughput of Algorithm 2 is at least $C_{MEQ}(\mathcal{G})/2$.

4.3  Solving  the  MEQ  Problem  without  Knowing  the  Spanning  Trees 

In Algorithm 2, we have assumed that the R undirected unit-capacity spanning trees are known
a priori. It turns out that knowing the actual trees is not necessary - knowing the value R suffices.
Here we present Algorithm 3 that solves the MEQ problem with parameter p ≤ R, which achieves
throughput p without knowing any spanning tree. For this algorithm, the input value at each
node i is represented as a row vector of p symbols from $GF(2^{B/p})$, as in Algorithm 2.

Since p ≤ R, there exists a set of p edge-disjoint undirected spanning trees in $\overline{\mathcal{G}}$. These trees form
a subgraph of the network $\mathcal{G}$, and Algorithm 3 sends a randomly coded symbol on every edge of $\mathcal{G}$,
so all the tree edges also carry randomly coded symbols (without explicit
knowledge of the trees). As a result, Theorem 2 continues to hold for Algorithm 3 by substituting
R with any p ≤ R.

Algorithm 3 MEQ with parameter p

1. For every unit-capacity edge e = (i, j) in $\mathcal{G}$, choose a coefficient vector
$c_e = [c_e(1), c_e(2), \dots, c_e(p)]$ containing p symbols, each chosen uniformly at random from $GF(2^{B/p})$.
Node i transmits to node j symbol $y_e = c_e x_i^T$, which is a random linear combination of the
symbols in $x_i$. $y_e$ is said to be a "coded symbol".

2. At each node j, for each unit-capacity link e = (i, j) in $\mathcal{G}$, node j checks whether $y_e$ equals
$c_e x_j^T$. The check is said to fail if $y_e \neq c_e x_j^T$. (Recall that $y_e$ is $c_e x_i^T$.)

3. At each node j, if any of the received symbols fails the check in step 2, then node j sets its
output bit to 1; otherwise node j sets its output bit to 0.


5 Byzantine Broadcast (BB) with Local Linear Coding

In this section, we present an algorithm for Byzantine Broadcast in G(V, E), with node 1 as the
source, tolerating up to f Byzantine faults, when the network connectivity is at least 2f + 1, and
n ≥ 3f + 1. Connectivity ≥ 2f + 1 implies that for every pair of nodes i and j in G, after removing
any subset of 2f nodes, there is still a positive directed mincut from i to j.

5.1 Upper Bounds on the Capacity $C_{BB}(G)$ of BB

We first prove upper bounds on the capacity of Byzantine Broadcast in network G(V, E) as
functions of cut sets in the following two classes of its subgraphs:

1. Subgraphs of G(V, E), each containing n - f nodes. These subgraphs are named
$G_1, \dots, G_{\binom{n}{f}}$. For each $G_k$, let $R_k$ be the spanning tree packing number of $\overline{G}_k$. Define

$$R^* = \min_k R_k.$$

Then, at least $R^*$ undirected spanning trees can be packed in each $\overline{G}_k$.

2. A subgraph in the second class is obtained as follows: We will say that edges in $F \subseteq E$ are
"explainable" if there exists a set $W \subseteq V$ such that (i) W contains at most f nodes, and (ii)
each edge in F is incident on at least one node in W. Set W is then said to "explain set F".
Consider each explainable set of edges $F_i$. Suppose that $W_1, W_2, \dots$ are all the subsets of V
that explain edge set $F_i$. Subgraph $H_i$ is obtained by removing edges in $F_i$ from E, and nodes
in $\bigcap_k W_k$ from V.¹ In general, $H_i$ above may or may not contain the source node 1. We
only need to focus on those $H_i$'s that do contain node 1. We rename the subgraphs $H_i$'s that
contain source node 1 as $G'_1, G'_2, \dots, G'_k, \dots$.
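
The construction of the second class of subgraphs can be sketched by brute force for small networks. The code below enumerates every set W of at most f nodes that explains a given edge set F and then removes F together with the nodes common to all explaining sets. Two points are assumptions: the node-removal rule is read as the intersection of the explaining sets (which is what guarantees that every remaining node is outside at least one explaining set), and edges incident on removed nodes are dropped as well. Function names and the example network are hypothetical.

```python
from itertools import combinations


def explaining_sets(F, nodes, f):
    """All sets W of at most f nodes such that every edge in F touches W."""
    found = []
    for size in range(f + 1):
        for W in combinations(sorted(nodes), size):
            if all(u in W or v in W for (u, v) in F):
                found.append(set(W))
    return found


def subgraph_H(nodes, edges, F, f):
    """Return (nodes, edges) of H for edge set F, or None if F is not explainable."""
    explanations = explaining_sets(F, nodes, f)
    if not explanations:
        return None
    removed = set.intersection(*explanations)     # nodes present in every explaining set
    kept_nodes = set(nodes) - removed
    kept_edges = {(u, v) for (u, v) in edges
                  if (u, v) not in F and u in kept_nodes and v in kept_nodes}
    return kept_nodes, kept_edges


# Hypothetical complete directed network on 4 nodes with f = 1.
nodes = {1, 2, 3, 4}
edges = {(u, v) for u in nodes for v in nodes if u != v}
print(subgraph_H(nodes, edges, {(2, 3), (2, 4)}, f=1))
```

In this example only W = {2} explains F, so node 2 is removed. For F = {(1, 2)} both {1} and {2} explain F, their intersection is empty, and only the edge itself is removed.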

Theorem 3 In a point-to-point network G(V, E), the capacity of BB ($C_{BB}$) with node 1 as the source
satisfies the following upper bounds, using subgraphs $G_k$ and $G'_k$ defined above.

1. $C_{BB}$ is upper bounded by the capacity of MEQ in any subgraph $G_k$, that is, $C_{BB}(G) \le \min_{G_k} C_{MEQ}(G_k)$;

2. $C_{BB}$ is upper bounded by the capacity of broadcast from node 1 in any subgraph $G'_k$ in the absence of
failures, that is, $C_{BB}(G) \le \min_{G'_k} D(G'_k, 1)$.

Please see Section 2.3 for the definition of broadcast capacity D used above, and U used later below.
The proof of Theorem 3 is included in Appendix E. The intuition behind the first condition above
is that, given any BB algorithm A in network G with throughput R, algorithm A can also be used
to solve the MEQ problem in any subgraph $G_k$ with throughput R; hence $R \le C_{MEQ}(G_k)$.

The intuition for the second condition is as follows: Suppose that subgraph $G'_k$ was constructed
using an explainable edge set $F_i$ (see above). Due to the manner in which subgraph $G'_k$ is defined,
for any node j in $G'_k$, there must exist a set of nodes $W_j$ that explains $F_i$ such that $j \notin W_j$. Consider
a scenario in which the nodes in $W_j$ are faulty, and all other nodes are fault-free. Also suppose
that these faulty nodes misbehave by refusing to communicate with neighbors over the links
in $F_i$ (essentially pretending that the corresponding neighbors are faulty), but behave correctly
otherwise. Despite such misbehavior, as Appendix E elaborates, node j must still be able to
learn the value that source node 1 (possibly faulty) is broadcasting. So $C_{BB}(G)$ cannot exceed the
broadcast capacity from node 1 to the nodes in $G'_k$, that is, $D(G'_k, 1)$.

¹ It is possible that $H_i$ for different i may be identical. This does not affect the correctness of our algorithm.

Let $U^* = \min_{G_k} U(G_k)$ and $D^* = \min_{G'_k} D(G'_k, 1)$. We have the following from Theorems 1 and 3:

Corollary 1 The capacity of the BB problem in arbitrary network G satisfies:

$$C_{BB}(G) \le \min(U^*, D^*).$$


5.2  BB  with  Local  Linear  Coding 

Using Algorithm 3 presented in the previous section, we construct a BB algorithm that achieves
at least 1/2 or 1/3 of the upper bound in Corollary 1, hence at least 1/2 or 1/3 of C_BB(G), depending
on the relationship between certain cuts of graph G.

Suppose that the source node (node 1) wants to broadcast an L-bit value x using our BB
algorithm. This L-bit input value x is divided into L/B parts of size B bits each, as discussed in
the overview section. These parts are denoted as x(1), ..., x(L/B). The algorithm for L-bit BB consists of
L/B sequential executions (generations) of Algorithm 4 presented below; note that Algorithm 4
is a more detailed version of Algorithm 1 in Section 3. For the g-th generation (1 ≤ g ≤ L/B), the
source node 1 uses x(g) as its input in Algorithm 4. Each generation of the algorithm results in every
node deciding on the g-th part (namely x'(g)) of its final decision value x'.
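The generation structure can be sketched in a few lines. This is only an illustration: the helper names `split_into_generations` and `reassemble` are hypothetical, the input is represented as a string of '0'/'1' characters for readability, and B is assumed to divide L as in the text.

```python
def split_into_generations(x_bits: str, B: int) -> list[str]:
    """Split an L-bit input (a '0'/'1' string) into L/B parts of B bits each."""
    L = len(x_bits)
    if B <= 0 or L % B != 0:
        raise ValueError("B must be positive and divide L")
    # Part g (1-indexed in the text) holds bits (g-1)*B through g*B - 1.
    return [x_bits[g * B:(g + 1) * B] for g in range(L // B)]


def reassemble(parts: list[str]) -> str:
    """Concatenate the per-generation decisions x'(1), ..., x'(L/B) into x'."""
    return "".join(parts)


parts = split_into_generations("110100101110", B=4)
print(parts)              # one B-bit part per generation
print(reassemble(parts))  # the original 12-bit value
```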

For the subsequent discussion in this section, it will be helpful if the reader has read Algorithm 4
first. The proof of the correctness of Algorithm 4 includes proving the following three claims. We
will argue the correctness of the claims later.

•  Claim 1: In the absence of misbehavior by nodes in G', broadcast in G' in step 1 will deliver
data from node 1 to other peers in G' at rate D*. If some nodes in G' misbehave, then the
broadcast may be received incorrectly.

•  Claim 2: Performing Algorithm 3 with parameter R* over graph Q = G' in step 2 guarantees
that

(a) If no failure is detected, then all fault-free nodes must have identical x_i's, and hence BB
for the current generation is correct; or

(b) If failure is detected, some faulty node must have misbehaved. In this case fault diagnosis
is performed.

•  Claim 3: During fault diagnosis in step 3, at least one edge and/or one node will be identified
as faulty and will be removed from G'. All fault-free nodes and the links between them
remain in G' throughout the whole algorithm.

Now  we  argue  the  correctness  of  the  claims. 

Broadcast (claim 1): It can be shown that G' belongs to the second class of subgraphs of G
defined earlier in this section, and all the fault-free nodes are always included in G'. Recall that
D* = min_{G'_k} D(G'_k, 1) ≤ D(G', 1). So the broadcast from node 1 in G' at rate D* is achievable with
simple store-and-forward routing [7], if no node in G' misbehaves. Data received by a fault-free
peer i during this broadcast is named x_i.

Algorithm 4 BB Algorithm (generation g)

Let G' be the subgraph of G obtained by removing from G those edges and nodes that have been
identified as faulty during the Fault Diagnosis phase. Graph G' evolves with time, and only the
nodes in G' perform the algorithm below. It can be shown that all the fault-free nodes, and the
links between them, are always in G'.

1. Broadcast: Source node 1 uses a traditional store-and-forward approach to broadcast the
g-th part of its input, x(g), to the nodes in subgraph G', at rate D*. This broadcast is not
fault-tolerant. Source node 1 sets x_1 = x(g), and each peer i (i ≠ 1) sets x_i equal to the B bits
received during the above broadcast (if nothing is received, then x_i is set to some default
value).

2. Failure Detection: Perform Algorithm 3 with parameter R* over graph Q = G', with the x_i's at
the nodes in G' as the inputs (we will justify the use of G' and R* in the discussion below).

(a) Each node in G' obtains an output bit from the MEQ algorithm execution. Each node
in G' broadcasts its output bit using a traditional 1-bit Byzantine Broadcast algorithm,
denoted as Broadcast_Binary (e.g., the algorithms in [6, 3] may be used).

Failure is detected if the output bit broadcast by any of the nodes in G' is 1.

(b) If failure is not detected, every fault-free node i in G' decides on output x'(g) = x_i
and proceeds to the next generation. Otherwise, the fault-free nodes perform Fault
Diagnosis.

3. Fault Diagnosis: Performed only when failure is detected during Failure Detection.
Information about the identity of the faulty node(s) is updated using the "dispute control"
mechanism [1]. During Fault Diagnosis, node 1 once again broadcasts x_1 using algorithm
Broadcast_Binary. By the end of Fault Diagnosis, additional nodes and/or links in G' may
be identified as faulty, and removed from the G' used in the subsequent generations.

(a) If node 1 is found faulty in the Fault Diagnosis step, terminate the algorithm with a
default output.

(b) Otherwise, every fault-free node i sets x'(g) equal to the x_1 broadcast by node 1 during
Fault Diagnosis, and proceeds to the next generation.


Failure Detection (claim 2): For failure detection, our objective is to solve the MEQ problem in
every subgraph G_k that is also a subgraph of G' (recall that G_k is in the first class of subgraphs of G
defined earlier), and to determine whether absence of equality is detected during at least one of the
MEQ executions. A naive approach for this is to run the MEQ algorithm independently for each
of these subgraphs G_k (recall that each G_k considered here is also a subgraph of G'). If in any of
these executions, a node in one of the subgraphs G_k sets its output to 1, then an absence of equality
would be detected. However, we only need to know whether the equality check failed in one of these
subgraphs; the identity of the subgraph is not required to be known. Due to this, along with the
local communication (only between adjacent nodes) used in MEQ Algorithm 3, we can achieve
the goal more efficiently. In particular, we can apply Algorithm 3 just once to graph G' itself, with
parameter ρ for the algorithm chosen so as not to exceed the number of disjoint undirected spanning
trees in any of the subgraphs G_k that are included in G'. In particular, we choose ρ = R*.

By a simple extension of Theorem 2, it can be shown that Algorithm 3 with parameter R* can
detect, with a non-zero probability, the absence of equality in all G_k that are subgraphs of G'.

This claim follows from the theorem below. Let us first define the matrices M^(k) used in the theorem.
Each subgraph G_k contains n - f nodes. For each subgraph G_k included in G', let us pick
any set of R* undirected spanning trees of G_k and construct a square matrix, say M^(k), of size
(n - f - 1)R*, similar to M_R in the previous section. An edge being reused in spanning trees of
multiple subgraphs G_k is equivalent to the corresponding M^(k)'s having one row in common (up
to permutation of node indices). The theorem below (proof included in Appendix F) states that,
for sufficiently large B, all M^(k) matrices can be made invertible simultaneously:

Theorem 4 When performing Algorithm 3 with parameter ρ = R* in Q = G', all matrices M^(k) are
invertible simultaneously with probability at least 1 - (n choose n-f)(n - f - 1)R* / 2^{B/R*}.

It can be shown that G' always contains all fault-free nodes as well as the links between them.
So the union of all subgraphs G_k included in G' contains all the fault-free nodes and the links between
them. As noted above, if the inputs (at the fault-free nodes) to Algorithm 3 are unequal, then the
inequality will be detected by Algorithm 3. Alternatively, it is also possible that a faulty node in
G' may misbehave, leading to the conclusion that the inputs are unequal. It follows that if the output
bits broadcast after the Failure Detection step are all 0, then all fault-free nodes i will have identical
x_i values, and hence BB of this generation is correct. Otherwise, some node must have misbehaved
sometime during the current generation (including possibly node 1).
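The flavor of a local, coded equality check can be sketched as follows. This is not Algorithm 3 itself, whose details are given elsewhere in the paper; it is a simplified stand-in under stated assumptions: one coded symbol per directed link, coefficients drawn from shared randomness, and a prime field GF(P) in place of GF(2^{B/ρ}). All names (`equality_check`, `P`, the example inputs) are hypothetical.

```python
import random

P = 2_147_483_647  # a prime modulus, standing in for the field GF(2^(B/rho))


def equality_check(inputs: dict, edges: list, rho: int, seed: int = 0) -> dict:
    """Return one output bit per node: 1 if the node observed a mismatch, else 0.

    inputs maps node -> list of rho field symbols; edges lists directed links (i, j).
    """
    rng = random.Random(seed)  # shared randomness: every node knows the coefficients
    output = {node: 0 for node in inputs}
    for (i, j) in edges:
        coeffs = [rng.randrange(P) for _ in range(rho)]
        # Node i sends one coded symbol: a linear combination of its own symbols.
        sent = sum(c * s for c, s in zip(coeffs, inputs[i])) % P
        # Node j recomputes the same combination from its own input and compares.
        expected = sum(c * s for c, s in zip(coeffs, inputs[j])) % P
        if sent != expected:
            output[j] = 1
    return output


ring = [(1, 2), (2, 3), (3, 1)]
print(equality_check({1: [5, 7], 2: [5, 7], 3: [5, 7]}, ring, rho=2))
print(equality_check({1: [5, 7], 2: [5, 7], 3: [5, 8]}, ring, rho=2))
```

With identical inputs every comparison succeeds, so all output bits are 0. With unequal inputs a mismatch escapes a given link only if the random coefficients happen to cancel the difference, which is the event the invertibility argument bounds.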

Fault Diagnosis (claim 3): The Fault Diagnosis step using "dispute control" [1] is then performed:
this step results in either (a) a specific node being identified as the culprit during
generation g, or (b) the identity of the misbehaving node being narrowed down to a pair of neighboring
nodes (that are connected in G'), such that at least one of the nodes is certain to be faulty. The
link between this pair of nodes is then deemed "faulty" and not used in future generations. Also,
due to the use of "dispute control", fault diagnosis will be performed at most f(f + 1) times
throughout the whole execution. Please see Appendix G for more details.

5.3  Throughput  of  Algorithm  4 

In Appendix H, we show that the throughput of Algorithm 4 can be made arbitrarily close to
D*R*/(D* + R*) for large enough B and L = Ω(B). Then we further prove the following theorem:

Theorem 5 For sufficiently large B and L = Ω(B), throughput of Algorithm 4 satisfies:

\[ \text{Throughput} \ge \frac{C_{BB}(G)}{2} \quad \text{if } D^* \le R^*; \tag{4} \]

\[ \text{Throughput} \ge \frac{C_{BB}(G)}{3} \quad \text{if } D^* > R^*. \tag{5} \]

6  Conclusion 

We prove upper bounds on the capacity of Byzantine Broadcast in general point-to-point
networks. A local linear coding based BB algorithm that achieves throughput at least 1/2 or 1/3 of
the capacity C_BB in general point-to-point networks is introduced. To the best of our knowledge,
this is the first result for the BB problem that achieves a constant fraction of capacity for general
point-to-point networks.

Appendices


A  Proof  of  Theorem  1 

Suppose the cut between a certain set s ⊂ V and V - s is the minimum undirected cut in
the undirected multigraph representation of Q. Now consider a constrained version of the MEQ
problem wherein all nodes in s have an identical L-bit input, denoted as x, and all nodes in V - s
have another identical L-bit input, denoted as y. If we ignore the cost of "internal" communication
among the nodes in set s, and similarly ignore the cost of internal communication among the nodes
in V - s, then solving the MEQ problem with such a constrained distribution of inputs is equivalent
to solving the 2-party equality problem: Alice and Bob are given inputs x and y, respectively, and
need to check whether x = y by communicating with each other. It has been proved that at least L bits
must be communicated between Alice and Bob, in the worst case, to solve 2-party equality for L-bit
values. Since at most U(Q) bits can cross this cut per unit time, it follows that even if the inputs at
the nodes in Q are constrained as described above,
the execution time t(Q, L, A) must satisfy U(Q) t(Q, L, A) ≥ L for any L and any MEQ algorithm A.
This implies that C_MEQ(Q) ≤ U(Q).


B  Proof  of  Theorem  2 


Proof: We now show that each M_k is invertible with probability at least (1 - (m-1)/2^{B/R})^k. The proof is
done by induction, with k = 1 being the base case.

Base Case - k = 1:

\[ M_1 = C_{1,1} A_1 \tag{6} \]

As shown in Appendix C, A_k is always invertible and det(A_k) = ±1. Since C_{1,1} is an (m - 1)-
by-(m - 1) diagonal matrix, it is invertible provided that all its m - 1 diagonal elements are
non-zero. Recall that the diagonal elements of C_{1,1} are chosen uniformly and independently
from GF(2^{B/R}). The probability that they are all non-zero is (1 - 1/2^{B/R})^{m-1} ≥ 1 - (m-1)/2^{B/R}.

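Induction Step - k to k + 1 ≤ R: The (m - 1)(k + 1)-by-(m - 1)(k + 1) matrix M_{k+1} can be written as

\[ M_{k+1} = \begin{pmatrix} M_k & D_k \\ F_k & C_{k+1,k+1} A_{k+1} \end{pmatrix} \tag{7} \]

where

\[ D_k = \begin{pmatrix} A_1^T C_{1,k+1} & A_2^T C_{2,k+1} & \cdots & A_k^T C_{k,k+1} \end{pmatrix}^T \tag{8} \]

is an (m - 1)k-by-(m - 1) matrix, and

\[ F_k = \begin{pmatrix} C_{k+1,1} A_{k+1} & C_{k+1,2} A_{k+1} & \cdots & C_{k+1,k} A_{k+1} \end{pmatrix} \tag{9} \]

is an (m - 1)-by-(m - 1)k matrix.

Assuming that M_k is invertible, we transform M_{k+1} as follows:

\begin{align}
M'_{k+1} &= \begin{pmatrix} I_{(m-1)k} & 0 \\ -F_k M_k^{-1} & I_{(m-1)} \end{pmatrix} M_{k+1} \begin{pmatrix} I_{(m-1)k} & 0 \\ 0 & A_{k+1}^{-1} \end{pmatrix} \tag{10} \\
&= \begin{pmatrix} I_{(m-1)k} & 0 \\ -F_k M_k^{-1} & I_{(m-1)} \end{pmatrix} \begin{pmatrix} M_k & D_k \\ F_k & C_{k+1,k+1} A_{k+1} \end{pmatrix} \begin{pmatrix} I_{(m-1)k} & 0 \\ 0 & A_{k+1}^{-1} \end{pmatrix} \tag{11} \\
&= \begin{pmatrix} M_k & D_k A_{k+1}^{-1} \\ 0 & C_{k+1,k+1} - F_k M_k^{-1} D_k A_{k+1}^{-1} \end{pmatrix} \tag{12}
\end{align}

Here I_q denotes a q-by-q identity matrix. Note that |det(M'_{k+1})| = |det(M_{k+1})|, since the matrix
multiplied at the left has determinant 1, and the matrix multiplied at the right has determinant ±1.

Observe that the diagonal elements of the (m - 1)-by-(m - 1) diagonal matrix C_{k+1,k+1} are chosen
independently of F_k M_k^{-1} D_k A_{k+1}^{-1}. Then it can be proved that C_{k+1,k+1} - F_k M_k^{-1} D_k A_{k+1}^{-1} is invertible
with probability at least 1 - (m-1)/2^{B/R} (see Appendix D), given that M_k is invertible, which happens
with probability at least (1 - (m-1)/2^{B/R})^k according to the induction assumption. So we have

\[ \Pr\{M_{k+1} \text{ is invertible}\} \ge \left(1 - \frac{m-1}{2^{B/R}}\right)\left(1 - \frac{m-1}{2^{B/R}}\right)^{k} = \left(1 - \frac{m-1}{2^{B/R}}\right)^{k+1}. \]

This completes the induction. Now we can see that M_R is invertible with probability

\begin{align}
&\ge \left(1 - \frac{m-1}{2^{B/R}}\right)^{R} \tag{13} \\
&\ge 1 - \frac{(m-1)R}{2^{B/R}} \to 1, \text{ as } L \to \infty. \tag{14}
\end{align}

□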


C  Proof  of  Ak  being  Invertible 

Observe that, for edges incident on node n, the corresponding rows have exactly one non-zero
entry. Also, the row corresponding to an edge that is incident on node i has a non-zero entry in
column i. Since there must be at least one edge in T_k that is incident on node n, there must be
at least one row of A_k that has only one non-zero element. Also, since every node is incident on
at least one edge in T_k, every column of A_k has at least one non-zero element. Since there is at
most one edge between every pair of nodes in T_k, no two rows are non-zero in identical columns.
Therefore, by row manipulation, we can transform matrix A_k into another matrix in which every
row and every column has exactly one non-zero element. Hence det(A_k) equals either 1 or -1,
and A_k is invertible.
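The claim can be checked numerically on small trees. The sketch below assumes the layout described above (rows are the edges of a spanning tree on nodes 1..n, columns are nodes 1..n-1, node n dropped) and computes the determinant exactly over the rationals; an integer determinant of ±1 implies invertibility over any field, including GF(2^{B/R}). The function names and the random-tree generator are illustrative assumptions.

```python
from fractions import Fraction
import random


def reduced_incidence_matrix(n: int, tree_edges: list) -> list:
    """Rows = tree edges, columns = nodes 1..n-1; entry 1 if the edge touches the node."""
    return [[1 if v in e else 0 for v in range(1, n)] for e in tree_edges]


def determinant(matrix: list) -> Fraction:
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    size, det = len(a), Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return det


def random_tree(n: int, rng: random.Random) -> list:
    """Random spanning tree: attach each node k >= 2 to a random earlier node."""
    return [(rng.randrange(1, k), k) for k in range(2, n + 1)]


rng = random.Random(1)
for n in range(2, 9):
    A = reduced_incidence_matrix(n, random_tree(n, rng))
    print(n, determinant(A))
```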


D  Proof of C_{k+1,k+1} - F_k M_k^{-1} D_k A_{k+1}^{-1} being Invertible

Let Q be an arbitrary fixed q-by-q matrix. Consider a random q-by-q diagonal matrix C
with q diagonal elements c_1, ..., c_q:

\[ C = \begin{pmatrix} c_1 & 0 & \cdots & 0 \\ 0 & c_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & c_q \end{pmatrix} \tag{15} \]

The diagonal elements of C are selected independently and uniformly at random from GF(2^p).
Then we have:

Theorem 6 The probability that the q-by-q matrix C - Q is invertible is lower bounded by:

\[ \Pr\{(C - Q) \text{ is invertible}\} \ge 1 - \frac{q}{2^p}. \]


Proof: Consider the determinant of matrix C - Q.

\begin{align}
& \det(C - Q) \tag{16} \\
&= \det \begin{pmatrix} (c_1 - Q_{1,1}) & -Q_{1,2} & \cdots & -Q_{1,q} \\ -Q_{2,1} & (c_2 - Q_{2,2}) & \cdots & -Q_{2,q} \\ \vdots & & \ddots & \vdots \\ -Q_{q,1} & \cdots & -Q_{q,q-1} & (c_q - Q_{q,q}) \end{pmatrix} \tag{17} \\
&= (c_1 - Q_{1,1})(c_2 - Q_{2,2}) \cdots (c_q - Q_{q,q}) + \text{other terms} \tag{18} \\
&= \prod_{i=1}^{q} c_i + \mathcal{Q}_{q-1} \tag{19}
\end{align}

The first term above, \prod_{i=1}^{q} c_i, is a degree-q polynomial of c_1, ..., c_q. \mathcal{Q}_{q-1} is a polynomial of degree at
most q - 1 of c_1, ..., c_q, and it represents the remaining terms in det(C - Q). Notice that det(C - Q)
cannot be identically zero since it contains exactly one degree-q term. Then by the Schwartz-Zippel
Theorem, the probability that det(C - Q) = 0 is at most q/2^p. Since C - Q is invertible if and only if
det(C - Q) ≠ 0, we conclude that

\[ \Pr\{(C - Q) \text{ is invertible}\} \ge 1 - \frac{q}{2^p} \tag{20} \]

By setting C = C_{k+1,k+1}, Q = F_k M_k^{-1} D_k A_{k+1}^{-1}, q = m - 1 and p = B/R, we prove that
C_{k+1,k+1} - F_k M_k^{-1} D_k A_{k+1}^{-1} is invertible with probability at least 1 - (m-1)/2^{B/R}. □
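The bound of Theorem 6 can be checked exhaustively on a toy instance. The sketch below swaps GF(2^p) for a small prime field GF(p), which is an assumption made purely to keep the arithmetic simple (the Schwartz-Zippel argument holds over any finite field, with the field size in place of 2^p). The matrix Q and the helper names are hypothetical.

```python
from itertools import product


def det_mod_p(matrix: list, p: int) -> int:
    """Determinant modulo a prime p via Gaussian elimination."""
    a = [[x % p for x in row] for row in matrix]
    n, det = len(a), 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col]), None)
        if pivot is None:
            return 0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det = det * a[col][col] % p
        inv = pow(a[col][col], -1, p)  # modular inverse (Python 3.8+)
        for r in range(col + 1, n):
            factor = a[r][col] * inv % p
            for c in range(col, n):
                a[r][c] = (a[r][c] - factor * a[col][c]) % p
    return det % p


def invertible_fraction(Q: list, p: int) -> float:
    """Fraction of diagonal matrices C over GF(p) for which C - Q is invertible."""
    q = len(Q)
    good = 0
    for diag in product(range(p), repeat=q):
        m = [[(diag[i] if i == j else 0) - Q[i][j] for j in range(q)] for i in range(q)]
        good += det_mod_p(m, p) != 0
    return good / p ** q


Q = [[1, 2], [3, 4]]
print(invertible_fraction(Q, p=5), ">=", 1 - len(Q) / 5)
```

For this Q over GF(5), det(C - Q) vanishes exactly when (c_1 - 1)(c_2 - 4) = 1, which has 4 solutions among the 25 diagonal choices, comfortably within the q/p = 2/5 allowance.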

E  Proof  of  Theorem  3 

In arbitrary point-to-point network G(V, E), the capacity of the BB problem with node 1 being
the source and up to f < n/3 faults satisfies the following upper bounds:

E.1  First Condition

C_BB is upper bounded by the capacity of MEQ in any subgraph G_k, that is,

\[ C_{BB}(G) \le \min_{G_k} C_{MEQ}(G_k). \]

Proof: Given any subgraph G_k of G with size n - f, let us rename the nodes in G_k as g_1, g_2, ..., g_{n-f},
and the nodes not in G_k as b_1, b_2, ..., b_f.

We first discuss the case when the source of the Byzantine broadcast problem is not in G_k.
Without loss of generality, we can assume that b_1 is the source.

Given any algorithm A that solves BB in network G, with node b_1 as the source and
at most f failures, and achieves throughput R, we construct a protocol A' that
solves MEQ in G_k with throughput R. For the MEQ problem, let us assume that x_i is
the input value at node g_i. Thus, the goal is to determine whether x_i is identical at all g_i ∈ G_k. A'
is constructed as follows:

1. Every node g_i ∈ G_k creates a local virtual network as follows:

(a) It creates one virtual node g_{j,i} for each g_j ∈ G_k, j ≠ i. Similarly, it creates one virtual
node b_{l,i} for each b_l ∉ G_k. Node g_i also includes itself in the local virtual network.

(b) Every pair of virtual nodes is connected with a pair of links of the same capacity as
the ones that connect the corresponding pair of actual nodes in the original network
G. In other words, link (g_{j,i}, g_{l,i}) has the same capacity as link (g_j, g_l) in G. Similarly, link
(g_{j,i}, b_{l,i}) has the same capacity as link (g_j, b_l) in G.

(c) Node g_i connects itself with each of the virtual nodes b_{l,i} such that link (g_i, b_{l,i}) has the
same capacity as link (g_i, b_l) in G, and link (b_{l,i}, g_i) has the same capacity as link (b_l, g_i)
in G.

(d) Node g_i connects itself with each of the virtual nodes g_{j,i} with one link (g_i, g_{j,i}) that has
the same capacity as link (g_i, g_j) in G. There is no link from virtual node g_{j,i} to node g_i.

(e) Every virtual node is assigned the same code that the corresponding actual node
should run in algorithm A. In other words, virtual node b_{l,i} is assigned the execution
code that node b_l should run in A. Virtual node g_{j,i} is assigned the execution code that
node g_j should run in algorithm A, except that it drops the messages that should be sent
to node g_i (since there is no link from virtual node g_{j,i} to node g_i).

2. Every node g_i ∈ G_k executes correctly as specified by algorithm A. When algorithm A
specifies that g_i should send a message to an actual node g_j ∈ G_k, g_i sends the message to
both the actual node g_j and the virtual node g_{j,i}. When it should send a message to node
b_l ∉ G_k according to algorithm A, g_i sends the message only to the virtual node b_{l,i}. When
it receives a message from virtual node b_{l,i}, it pretends that the message is received from
the actual node b_l. Since there is no link from virtual node g_{j,i} to node g_i, g_i will not receive
messages from g_{j,i}.

3. Recall that node b_1 is the source node for the broadcast being performed by algorithm A.
Thus, each virtual node b_{1,i} should be given an input value for algorithm A. Each node
g_i ∈ G_k sets the input value at node b_{1,i} equal to x_i (recall that x_i is the input for the MEQ
operation at node g_i). Thus, algorithm A for node b_{1,i} will have input x_i. Then, all the nodes
in the network perform their part of algorithm A, with each node g_i simulating the behavior
of the corresponding virtual nodes.

4. As we will see later, algorithm A will eventually terminate (at all nodes, including the virtual
nodes). When algorithm A terminates, every node g_i ∈ G_k obtains an output value x'_i. Each
node g_i sets its output for the MEQ problem to 0 if x_i = x'_i, and to 1 otherwise.

Figure 2 illustrates the construction of an MEQ protocol A' for 3 nodes, g_1, g_2 and g_3, from a
Byzantine broadcast algorithm A for 4 nodes and at most 1 failure. The gray areas indicate the
virtual networks created by nodes g_1, g_2 and g_3.

Observe that, from the perspective of nodes g_1, ..., g_{n-f}, what they see from the execution
of A' is the same as what they would have seen from an execution of A in which each node b_l
(1 ≤ l ≤ f) is faulty and behaves like b_{l,i} toward node g_i. Since algorithm A solves BB with up to f failures in
G, it will terminate in the above execution as well, and each node g_i ∈ G_k will obtain an identical
output value x'_i = x for some value x. The exact value of x will depend on the inputs x_i.

Now consider two cases:

•  x_1 = x_2 = ... = x_{n-f} = z (the input to the MEQ problem at each node in G_k is equal to z): In this
case, observe that all the simulated source nodes b_{1,i} will have identical input, equal to z.
It should be easy to see that the behavior of nodes in G_k will then be identical to their behavior
in network G wherein node b_1 has input z, with all the nodes behaving correctly. Then the
output value x from A must equal z for all g_i ∈ G_k, and hence all the nodes in G_k will set
their outputs for the MEQ problem to 0 correctly (since x_i = z).

•  ∃ g_i, g_j ∈ G_k s.t. x_i ≠ x_j (the inputs for the MEQ problem at the nodes in G_k are not identical):
In this case as well, as noted above, the output x at all the nodes in G_k is identical by the
definition of algorithm A. However, since x_i ≠ x_j, it follows that x must be different from at least
one of x_i and x_j. Without loss of generality, assume that x_i ≠ x. Then node g_i will set its
output for the MEQ problem to 1, and inequality of the inputs will be correctly detected.
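The two cases can be mirrored in a few lines. This is only a sketch of the final output rule of A', assuming the BB execution has already produced a common decision x at every node of G_k (which is what agreement guarantees); the function names are hypothetical.

```python
def meq_outputs(inputs: list, bb_decision) -> list:
    """Output bit of each node g_i: 0 if x_i equals the common BB decision x, else 1."""
    return [0 if x_i == bb_decision else 1 for x_i in inputs]


def inequality_detected(inputs: list, bb_decision) -> bool:
    """True if at least one node outputs 1."""
    return any(meq_outputs(inputs, bb_decision))


# Case 1: identical inputs z; the decision must equal z, so every node outputs 0.
print(meq_outputs(["z", "z", "z"], "z"))

# Case 2: unequal inputs; whatever common value x is decided, some node differs from it.
for x in ["a", "b", "c"]:
    print(x, inequality_detected(["a", "a", "b"], x))
```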

Now we can conclude that, given any algorithm A that solves BB in G at some throughput R,
we can construct a protocol A' that solves the MEQ problem in G_k at the same throughput R, when
the source is not in G_k. This implies that C_BB(G) ≤ C_MEQ(G_k).

Figure 2: Solving MEQ in 3 nodes with Byzantine broadcast algorithm for 4 nodes.

The discussion when the source, namely g_1, is in G_k is almost the same. We can construct the
virtual network for each g_i in the same way as described above, with the following modifications:
(1) node g_1 sets x_1, its input for the MEQ problem, as the value that it will broadcast using algorithm
A; and (2) each node g_i ≠ g_1 sets x_i as the initial input at node g_{1,i}, i.e., the virtual node corresponding
to node g_1. Then we can prove that C_BB(G) ≤ C_MEQ(G_k) when the source is in G_k using the same
argument as above.

□ 

E.2  Second  Condition 

C_BB is upper bounded by the capacity of broadcast from node 1 in any subgraph G'_k in the absence
of failures, that is,

\[ C_{BB}(G) \le \min_{G'_k} D(G'_k, 1). \]


Proof: Consider any G'_k and let F_k ⊆ E be the corresponding set of edges that are removed to
construct G'_k. By the construction of G'_k, there must be at least one set W ⊂ V that explains F_k and
does not contain the source node 1, otherwise node 1 must have been removed. Without loss of
generality, let W_1 be such a set that does not contain the source node. We are going to show that
C_BB(G) ≤ MINCUT(G'_k, 1, i) for every peer node i that is in G'_k.

First consider any peer node i in G'_k with i ∉ W_1. Let all the nodes in W_1 be faulty such that
they refuse to communicate over edges in F_k, but otherwise behave correctly. In this case, since
the source is fault-free, node i must be able to receive the value node 1 is trying to broadcast. So
C_BB(G) ≤ MINCUT(G'_k, 1, i).

Now consider a peer node i in G'_k with i ∈ W_1. Notice that in this case i ∉ ∩_l W_l, otherwise node
i must have been removed. If there exists a set W_j ⊂ V that explains F_k and contains neither node
1 nor node i, then following the same argument as above, C_BB(G) ≤ MINCUT(G'_k, 1, i). Otherwise,
there must exist a set W_j that contains node 1 but not node i, given that node i was not removed,
i.e., i ∉ ∩_l W_l. Without loss of generality, let W_2 be such a set. Define V^- = V - W_1 - W_2. V^-
is not empty since W_1 and W_2 both contain at most f nodes and there are n ≥ 3f + 1 nodes in V.
Consider two scenarios: (1) nodes in W_1 are faulty and refuse to communicate over edges in F_k;
and (2) nodes in W_2 are faulty and refuse to communicate over edges in F_k. Observe that among
edges between nodes in V^- and W_1 ∪ W_2, only edges between V^- and W_1 ∩ W_2 could have been
removed, because otherwise F_k cannot be explained by both W_1 and W_2. So nodes in V^- cannot
distinguish between these two scenarios. In scenario (1), the source (node 1) is not faulty. Hence
nodes in V^- must agree with the value x that node 1 is trying to broadcast, according to the validity
condition. Since nodes in V^- cannot distinguish between the two scenarios, they must also set their
outputs to x in scenario (2), even though in this case the source (node 1) is faulty. Then according
to the agreement condition, node i must agree with nodes in V^- in scenario (2), which means that
node i also learns x. So C_BB(G) ≤ MINCUT(G'_k, 1, i).

Now we can conclude that C_BB(G) ≤ D(G'_k, 1), and the theorem follows.

□ 


F  Proof  of  Theorem  4 

Theorem 4 When performing Algorithm 3 with parameter ρ = R* in Q = G', all matrices M^(k) are
invertible simultaneously with probability at least 1 - (n choose n-f)(n - f - 1)R* / 2^{B/R*}.

Proof: Let K be the number of subgraphs G_k included in G'. Without loss of generality, rename
these subgraphs as G_1, ..., G_K. Let

\[ M^* = \prod_{k=1}^{K} M^{(k)}. \]

Since R* ≤ R_k, according to Theorem 2, each M^(k) matrix (1 ≤ k ≤ K) is invertible with non-zero
probability. It implies that det(M^(k)) is a not-identically-zero polynomial of the random coefficients
of degree at most (n - f - 1)R* (since M^(k) is a square matrix of size (n - f - 1)R*). So

\[ \det(M^*) = \prod_{k=1}^{K} \det(M^{(k)}) \]

is a not-identically-zero polynomial of the random coefficients of degree at most K(n - f - 1)R*.
Notice that each coded symbol is used once in each subgraph G_k. So each random coefficient
appears in at most one row in each M^(k). It follows that the largest exponent of any random
coefficient in det(M*) is at most K.

According to Lemma 1 of [10], a not-identically-zero polynomial of degree no more than dv,
in which the largest exponent of any variable is at most d, equals zero with probability at most
1 - (1 - d/q)^v for d < q, where q is the size of the field from which each variable is chosen. In our
case, d = K, v = (n - f - 1)R*, and q = 2^{B/R*}, so the probability that det(M*) is non-zero is at least

\[ \left(1 - \frac{K}{2^{B/R^*}}\right)^{(n-f-1)R^*} \ge 1 - \frac{K (n-f-1) R^*}{2^{B/R^*}}. \]

Since G' is a subgraph of G, and there are at most (n choose n-f) subgraphs G_k in G, K ≤ (n choose n-f). Then the
theorem follows. □
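To get a feel for how large B must be before the bound of Theorem 4 becomes meaningful, it can be evaluated directly. The function name and the sample parameters below are illustrative assumptions; the formula is the one stated in the theorem, clipped at 0 where the bound is vacuous.

```python
from math import comb


def theorem4_lower_bound(n: int, f: int, r_star: int, B: int) -> float:
    """Evaluate 1 - C(n, n-f) * (n-f-1) * R* / 2^(B/R*), clipped at 0 when vacuous."""
    failure = comb(n, n - f) * (n - f - 1) * r_star / 2 ** (B / r_star)
    return max(0.0, 1.0 - failure)


# Hypothetical setting: n = 4 nodes, f = 1 fault, R* = 2.
for B in (8, 16, 32, 64):
    print(B, theorem4_lower_bound(n=4, f=1, r_star=2, B=B))
```

With these parameters the numerator is 4 * 2 * 2 = 16, so the bound is vacuous at B = 8 (where 2^{B/R*} = 16) and approaches 1 quickly as B grows.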

G  Fault  Diagnosis 

For fault diagnosis, every node i in G' broadcasts everything it has received and sent in the
Broadcast and Failure Detection phases during the current generation with Broadcast_Binary.
Due to the use of Broadcast_Binary, all fault-free nodes share the same information about what
each node claims it has received and sent. We will show that, by comparing this information, at
least one new pair of adjacent nodes, at least one of which must be faulty, will be identified.

First notice that everything a fault-free node sends in the Broadcast and Failure Detection
phases is a function of the information it receives, as specified by Algorithm 4. So suppose that what a node i
claims it has sent is inconsistent with what it claims to have received, that is: (1) it claims to have received
a symbol y during the Broadcast phase, but it claims to have forwarded a different y' instead of y as
it should have; or (2) the coded symbols it claims to have sent during Failure Detection are not a
valid linear combination of x_i; or (3) the output bit of Algorithm 3 contradicts x_i and the coded
symbols it claims to have received. Then node i must be faulty. In that case faulty node i will be removed
from G', and all its adjacent edges will be removed as well.

Now  consider  the  case  when  every  faulty  node  are  "smart"  enough  so  that  what  it  claims  to 
have  sent  is  consistent  with  what  it  claims  to  have  received.  Then  there  must  exist  at  least  a  pair 
of  nodes  i  and  j  such  that  what  node  i  claims  to  have  sent  to  node  j  contradicts  with  what  node  j 
claims  to  have  received  from  node  i.  Suppose  on  the  contrary  that  there  is  no  contradicting  claims 
between  two  nodes,  then  the  claims  from  all  peers  are  all  consistent  with  the  x(g)  that  node  1  claims 
it  has  sent.  If  so,  then  all  output  bits  of  Algorithm  3  from  fault-free  nodes  should  be  0.  Since  failure 
has  been  detected,  at  least  one  node  k  in  G'  must  have  set  its  output  bit  as  1.  Realize  that  this  fact 
contradicts  with  x \  and  the  coded  symbols  node  k  claims  to  have  received,  which  contradicts  with 
the  assumption  that  every  faulty  node  are  "smart"  enough  so  that  what  it  claims  to  have  sent  is 
consistent  with  what  it  claims  to  have  received.  This  completes  the  argument  that  at  least  one  link 
in  G'  will  be  identified  as  guilty  and  will  be  removed. 

It  is  easy  to  see  that  the  claims  of  any  two  fault-free  nodes  never  contradict,  so  at  least  one  of 
the  two  nodes  adjacent  to  a  faulty  link  must  be  faulty. 
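
The pairwise comparison of claims can be sketched as follows. This is only a minimal illustration: the dictionaries `sent` and `received`, and the function name `find_guilty_edges`, are hypothetical and are not part of Algorithm 4. The sketch assumes that every node's claims have already been reliably broadcast and are self-consistent.

```python
def find_guilty_edges(sent, received):
    """Return the set of edges (i, j) whose endpoints' claims disagree.

    sent[i][j]     -- what node i claims to have sent to node j
    received[j][i] -- what node j claims to have received from node i
    Both layouts are assumptions made for this sketch.
    """
    guilty = set()
    for i, outgoing in sent.items():
        for j, payload in outgoing.items():
            # A mismatch means at least one of i and j must be faulty.
            if received.get(j, {}).get(i) != payload:
                guilty.add((i, j))
    return guilty


# Node 2 claims to have sent "y" to node 3, but node 3 claims it received "z".
sent = {1: {2: "y", 3: "y"}, 2: {3: "y"}}
received = {2: {1: "y"}, 3: {1: "y", 2: "z"}}
guilty_edges = find_guilty_edges(sent, received)
print(guilty_edges)
```

Since the claims of two fault-free nodes never contradict, every edge returned here has at least one faulty endpoint.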

After removing the edges of G' that are found faulty, we try to identify additional faulty nodes
by applying to G' the operation used to construct the second class of subgraphs of G, as described
at the beginning of this section: find all subsets W_1, W_2, ... ⊂ V such that every W_k contains no
more than f nodes and explains all edges that have been removed so far. Given that only edges
adjacent to a faulty node can be removed, each W_k represents a potential set of no more than f
faulty nodes that explains the removed edges. So if some node i belongs to the intersection of all
the sets W_k, it must be faulty. This part is similar to the work in system-level diagnosis [24, 20].
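
The subset search can be illustrated with a brute-force sketch. It assumes that a set "explains" the removed edges when every removed edge has at least one endpoint in the set; the function names are hypothetical, and the exhaustive enumeration is for illustration only (its cost grows quickly with the number of nodes and with f).

```python
from itertools import combinations


def candidate_fault_sets(nodes, removed_edges, f):
    """All node subsets of size <= f that touch every removed edge."""
    candidates = []
    for size in range(f + 1):
        for subset in combinations(sorted(nodes), size):
            chosen = set(subset)
            if all(u in chosen or v in chosen for u, v in removed_edges):
                candidates.append(chosen)
    return candidates


def surely_faulty(nodes, removed_edges, f):
    """Nodes contained in every candidate set; these must be faulty."""
    candidates = candidate_fault_sets(nodes, removed_edges, f)
    if not candidates:
        return set()
    return set.intersection(*candidates)


nodes = {1, 2, 3, 4}
removed = [(2, 3), (2, 4)]
identified = surely_faulty(nodes, removed, f=1)
print(identified)
```

In this example, node 2 is the only single node adjacent to both removed edges, so it lies in every candidate set. With only the edge (2, 3) removed, both {2} and {3} are candidates and no node can be identified yet.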


H  Throughput  of  Algorithm  4 

Consider  the  time  cost  of  each  operation  of  Algorithm  4: 


•  Step 1: At most B/D* per generation, since broadcast from the source node 1 at rate D* is
achievable, as previously discussed. So the total cost is L/B × B/D* = L/D*.

•  Step 2: At most B/R* per generation, as discussed. So the total cost is L/B × B/R* = L/R*.

•  Step 2(a): The per-generation cost of broadcasting the output bits of MEQ with Broadcast_Binary
is some constant α independent of B and L. So the total cost is αL/B.

•  Step 3: Each time Fault Diagnosis is performed, at most βB bits are broadcast
with Broadcast_Binary, for some constant β. So the time cost is γB for some constant γ
independent of B and L. As discussed previously, diagnosis is performed at most f(f + 1)
times. So the total cost is f(f + 1)γB.

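Then we can compute the throughput of Algorithm 4 as

$$
\mathrm{TPT} = \frac{L}{\frac{L}{D^*} + \frac{L}{R^*} + \frac{\alpha L}{B} + f(f+1)\gamma B}
= \frac{D^* R^*}{D^* + R^* + \frac{\alpha D^* R^*}{B} + f(f+1)\gamma \frac{D^* R^* B}{L}}
\qquad (21)
$$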
Notice that D*, R*, f, α and γ are independent of B and L. By choosing B sufficiently large and
L sufficiently large relative to B (so that B/L approaches 0), the last two terms in the
denominator can be made arbitrarily close to 0. So TPT can be made arbitrarily close to

$$
\frac{D^* R^*}{D^* + R^*}. \qquad (22)
$$
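
The convergence of the throughput to expression (22) can be checked numerically. The sketch below simply adds up the four cost terms listed for Steps 1 to 3; the function name and all parameter values are illustrative assumptions, and L = B² is just one convenient way of letting L grow faster than B.

```python
def throughput(L, B, D, R, f, alpha, gamma):
    """L divided by the total time cost of Steps 1, 2, 2(a) and 3."""
    total_time = L / D + L / R + alpha * L / B + f * (f + 1) * gamma * B
    return L / total_time


# Illustrative values only; none of them come from the paper.
D, R, f, alpha, gamma = 4.0, 2.0, 1, 5.0, 3.0
limit = D * R / (D + R)  # expression (22)

for B in (10**2, 10**4, 10**6):
    L = B * B  # L grows faster than B, so B / L tends to 0
    print(B, throughput(L, B, D, R, f, alpha, gamma), limit)
```

As B grows, the printed throughput approaches the limit from below, because the two overhead terms are positive but shrink towards 0.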


Recall that for each G_k, we have U(G_k)/2 ≤ R_k ≤ U(G_k). Since R* = min_k R_k and U* =
min_k U(G_k), it follows that U*/2 ≤ R* ≤ U*. Also recall that C_BB(G) ≤ min(U*, D*). Then:


1.  When D* ≤ R*: Observe that TPT is an increasing function of R*. It is minimized when R* is
minimized, that is, when R* = D*. So

$$
\mathrm{TPT} \ge \frac{D^{*2}}{2D^*} = \frac{D^*}{2} \ge \frac{C_{BB}(G)}{2}. \qquad (23)
$$

The last inequality is due to D* ≥ C_BB(G).

2.  When R* ≤ D* ≤ U*:

$$
\mathrm{TPT} = \frac{D^* R^*}{D^* + R^*} \ge \frac{D^* R^*}{U^* + R^*} \ge \frac{C_{BB}(G)}{3}. \qquad (24)
$$

The last inequality is due to D* ≥ C_BB(G) and U* ≤ 2R*.

3.  When R* ≤ U* ≤ D*: Observe that TPT is an increasing function of D*. It is minimized when
D* is minimized, that is, when D* = U*. So

$$
\mathrm{TPT} \ge \frac{U^* R^*}{U^* + R^*} \ge \frac{C_{BB}(G)}{3}. \qquad (25)
$$

The second inequality is due to U* ≥ C_BB(G) and U* ≤ 2R*.
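
The three cases can be sanity-checked numerically. The sketch below compares the limiting throughput D*R*/(D* + R*) with the capacity upper bound min(U*, D*); since C_BB(G) is at most this bound, the printed ratio is a lower bound on the fraction of capacity achieved. The function names and the sample values are illustrative assumptions.

```python
def limiting_throughput(D, R):
    """Expression (22): the throughput for large B and L."""
    return D * R / (D + R)


def fraction_of_upper_bound(D, R, U):
    """Limiting throughput divided by the upper bound min(U*, D*)."""
    assert U / 2 <= R <= U, "requires U*/2 <= R* <= U*"
    return limiting_throughput(D, R) / min(U, D)


case1 = fraction_of_upper_bound(D=2.0, R=3.0, U=4.0)  # D* <= R*
case2 = fraction_of_upper_bound(D=3.0, R=2.0, U=4.0)  # R* <= D* <= U*
case3 = fraction_of_upper_bound(D=5.0, R=2.0, U=4.0)  # R* <= U* <= D*
print(case1, case2, case3)
```

The first ratio is at least 1/2 and the other two are at least 1/3, matching (23) to (25). The bounds are tight in this model: D* = R* gives exactly 1/2 in case 1, and D* = U* = 2R* gives exactly 1/3.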

In the above discussion, we implicitly assumed that all transmissions during the Broadcast stage
complete at the same time. In reality, however, Broadcast may require transmissions over multiple
hops, and a node cannot forward a message/packet until it has received it. So for each generation,
the information broadcast by the source propagates only one hop every B/D* time units, and for a
large network the "time span" of the Broadcast stage can be much larger than B/D*. This problem can
be solved by pipelining: we divide the time horizon into rounds of B/D* + B/R* time units. For
each generation g, x(g) from the source node 1 propagates one hop per round, using the first B/D*
time units of the round, until Broadcast completes. Then the remaining B/R* time units of the last
round are used to perform Failure Detection. An example in which Broadcast takes 3 hops is shown in
Figure 3. By pipelining, we achieve the throughput computed above.
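
The effect of pipelining on the total time can be sketched as follows. The sketch assumes that generation g (counting from 0) starts in round g, so that consecutive generations overlap; this start rule, the function name, and the parameter values are assumptions made for illustration, and the costs of Step 2(a) and of Fault Diagnosis are ignored.

```python
def pipelined_schedule(num_generations, hops, B, D, R):
    """Return (total_time, throughput) for the pipelined execution.

    Assumption for this sketch: generation g (0-based) starts in round g,
    advances one hop per round, and runs Failure Detection in the second
    part of the round in which its Broadcast completes.
    """
    round_length = B / D + B / R
    last_round = (num_generations - 1) + (hops - 1)  # 0-based round index
    total_time = (last_round + 1) * round_length
    throughput = num_generations * B / total_time
    return total_time, throughput


B, D, R, hops = 1000.0, 4.0, 2.0, 3
for n in (10, 1000, 100000):
    print(n, pipelined_schedule(n, hops, B, D, R))
```

With many generations, the extra hops - 1 rounds needed to fill the pipeline become negligible, and the throughput approaches D*R*/(D* + R*), which is 4/3 for the values used here.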
[Figure omitted. Legend: B/D* for line 1; B/R* for line 3.]

Figure 3: Example of pipelining